lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

index.md (2216B)


      1 +++
      2 title = 'Branch delays'
      3 +++
      4 # Branch delays
      5 ## Unconditional branches
      6 
      7 the two instructions that are fetched during decode and compute of first instruction (a branch) have to be discarded
      8 
      9 this two cycle-delay — “branch penalty”
     10 
     11 ![screenshot.png](screenshot-40.png)
     12 
     13 to reduce the penalty, branch target address must be computed earlier than pipeline — in the decode stage
     14 
     15 this reduces the penalty to one cycle:
     16 
     17 ![screenshot.png](screenshot-41.png)
     18 
     19 this needs hardware modification — PC has to be incremented in every cycle, and a second adder is needed in decode stage to compute branch target address for every instruction
     20 
     21 ## Conditional branches
     22 branch condition must be tested as early as possible
     23 comparator to test condition can be moved to decode stage
     24 it would use values from register file outputs A and B directly
     25 
     26 ## Branch delay slot — compiler reorganises instructions
     27 branch delays slot — the location that follows a branch instruction
     28 
     29 compiler tries to find an instruction that it always executed, independent of whether or not the program branches
     30 
     31 data dependencies must be preserved
     32 if the compiler can find a useful instruction, there’s no branch penalty
     33 otherwise, it NOPs out and there’s a penalty of one cycle
     34 
     35 ![screenshot.png](screenshot-42.png)
     36 
     37 ## Branch prediction
     38 Static:
     39 
     40 - assume branch will not be taken, fetch next instruction in sequential order
     41 - simple, decent accuracy
     42 - processor can determine static prediction by checking sign of branch offset
     43 - other option — encoding of branch instruction can include a bit indicating whether prediction should be ‘take’ or ‘not taken’ (set by compiler)
     44 
     45 Dynamic:
     46 
     47 - use actual branch behaviour
     48 - processor assumes that next time an instruction is executed, branch decision is likely to be the same as last time
     49 - keep more information about execution history, such as four states (strongly taken, likely taken, likely not taken, strongly not taken)
     50 - keep a branch target buffer
     51     - identifies branch instructions by their addresses
     52     - in the form of lookup table
     53     - contains: address of branch instruction, one/two state bits for branch prediction algorithm (outcome), branch target address